The CLaRK System: XML-based Corpora Development System for Rapid Prototyping
نویسندگان
چکیده
The paper presents the CLaRK System as a tool for the creation of XML-based corpora and a platform for rapid prototyping. The system provides a set of basic tools for processing XML documents. These tools include: tokenizers, regular grammars, constraints; remove, insert, extract, sort, transformation operations. Additionally, the system is equipped with a macro language which allows the creation of tools sequences. The macro language includes a set of control operators for guiding the application of the tools in the macro. Usually, a tool or a macro works over a single document changing it or producing a new document. In some cases processing of more than one document is necessary — in iterative statistics for treebank transformation, stand-off annotation, etc. For such processing the macro language allows a dynamic change of the processed documents.
منابع مشابه
Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
متن کاملThe CLaRK System Tools XML-based Corpora Development
CLaRK is an XML-based software system for corpora development. It incorporates several technologies: XML technology; Unicode; Regular Cascaded Grammars; Constraints over XML Documents. The basic components of the system are: a tagger, a concordancer, an extractor, a grammar processor, a constraint engine.
متن کاملDevelopment of Corpora within the CLaRK System: The BulTreeBank Project Experience
CLaRK is an XML-based software system for corpora development. It incorporates several technologies: XML technology; Unicode; Regular Cascaded Grammars; Constraints over XML Documents. The basic components of the system are: a tagger, a concordancer, an extractor, a grammar processor, a constraint engine.
متن کاملCLaRK - an XML-based System for Corpora Development
In this paper we describe the architecture and the intended applications of the CLaRK System. The development of the CLaRK System started under the T ubingen-So a International Graduate Programme in Computational Linguistics and Represented Knowledge (CLaRK). The main aim behind the design of the system is the minimization of human intervention during the creation of corpora. Creation of corpo...
متن کاملFrom State to Structure: an XML Web Publishing Framework
We present the main features of a system designed to support the development and delivery of web applications through concepts for modularity, reuse and rapid prototyping. The system is based on an extended object-oriented database system that manages both application data and publishing information in terms of content, structure and presentation. The process of information delivery uses dynami...
متن کامل